Introduction to R, Quarto and RStudio

Gerko Vink

Methodology & Statistics @ Utrecht University

Laurence Frank

Methodology & Statistics @ Utrecht University

2 Jun 2025

Topics of this lecture

  1. Introduction to R and RStudio

    • Working with packages
    • Getting help in R
  2. Reproducible data analysis with Quarto

  3. Organise your work with R Projects

  4. R data objects

    • vectors
    • matrices
    • data frames
    • lists
    • factors

Introduction to R and RStudio

What is R

  • R is a language and environment for statistical computing and for graphics
  • Based on the object-oriented language S (1975)
  • 100% free software
  • Managed by the R Foundation for Statistical Computing, Vienna, Austria.
  • Community-driven:
    • More than 10,000 packages developed by the community
    • New packages are constantly being developed
    • New features are constantly being added to existing packages

Fun fact about R:

R releases are given nicknames that refer to Peanuts comics. R version 4.3.3 (2024-02-29) is called “Angel Food Cake”.


What is RStudio?

  • RStudio is an Integrated Development Environment (IDE)
  • RStudio has all functionality in one place and makes working with R much easier.
  • Use RStudio to:
    • Edit and run scripts
    • Read and manage your code with syntax highlighting
    • Navigate files and organize projects
    • Use version control (e.g. Git and GitHub)
    • View static and interactive graphics
    • Create different file types (R Markdown and Quarto documents, Shiny apps)
    • Work with different languages (Python, JavaScript, C++, etc.)

The 4 panes in RStudio

Working with R packages

R Packages: base installation

  • When you start RStudio and R, only the base packages are loaded: the basic installation with basic functionality.

  • About 20,000 packages have been developed by R users all over the world and are available on the Comprehensive R Archive Network (CRAN).

  • It is not efficient to install all of these packages. Install only the packages you want to use.

See which packages are active

  • Use sessionInfo() to see which packages are active.

  • This is what the basic installation looks like:
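If you prefer to get this information with code, the sketch below uses two base R functions: sessionInfo() prints the R version, the platform and the attached packages, and .packages() returns only the names of the attached packages. The exact list you see depends on your R version and set-up.

```r
# Show the R version, the platform and all attached packages
sessionInfo()

# A shorter alternative: only the names of the attached packages
attached <- .packages()
attached
```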

Overview of installed packages

For an overview of the packages you have installed, see the “Packages” tab in the output pane:
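The same overview is also available in code. A minimal sketch using the base function installed.packages(), whose row names are the package names; the number you get depends on your own library.

```r
# Names of all packages installed in your library (not only the attached ones)
installed <- rownames(installed.packages())

head(installed)     # the first few package names
length(installed)   # how many packages are installed
```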

How to work with packages

Packages are to R what apps are to your mobile phone.

  • When you want to use a package for the first time, you have to install the package.

  • Each time you want to use the package, you have to load (activate) it.

Opening and closing packages

To load a package, use the following code (similar to opening an app on your phone):

library(ggplot2)

To detach (close) a package, use (similar to closing an app on your phone):

detach("package:ggplot2")
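The install-once, load-every-session workflow can be sketched with stats4, a package that ships with R but is not attached at start-up, so the example runs without an internet connection. The install.packages() line is commented out because it downloads from CRAN.

```r
# install.packages("ggplot2")   # install once (needs internet); commented out here

# Load (attach) a package: it is added to the search path
library(stats4)
"package:stats4" %in% search()   # TRUE: stats4 is now attached

# Detach it again ("close the app")
detach("package:stats4")
"package:stats4" %in% search()   # FALSE: stats4 is no longer attached
```

Note that detach() needs the name in the form "package:<name>", as it appears in search().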

Reproducible data analysis with Quarto

Why work with Quarto?

We need to combine code and text, and to document every step, to make reproducible (scientific) reports of data analyses.




It is efficient: you can generate and update reports in many formats (e.g. HTML, PDF, Word):

Source: “What is R Markdown?” (video by RStudio)

Demo RStudio and Quarto

Writing text in Quarto

See the R Markdown Cheat Sheet for a complete list of options.


Writing code in Quarto

Code chunks start with a line ```{r} (for R code) and end with a line ```. You can give a code chunk a name (here cars).

This is what the result looks like in the rendered HTML document, showing both the R code and its results:

Code chunk options

You can hide the R code with echo=FALSE in the chunk header (in Quarto you can also write #| echo: false as the first line of the chunk):

See the Quarto reference page for a complete list of chunk options.
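In Quarto, chunk options can be written as special #| comment lines at the top of a chunk. Below is a sketch of the body of such a chunk (the surrounding ```{r} and closing fence lines are left out). Because #| lines are ordinary R comments, the code also runs as plain R; cars is a dataset that ships with R.

```r
#| label: cars
#| echo: false
# Quarto reads the #| lines above as chunk options; R ignores them as comments
summary(cars)
```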

Getting help with Quarto

Quarto is the next generation of R Markdown. In RStudio you can find the Markdown Quick Reference in the Help menu:

RStudio Projects

Use RStudio Projects

Every time you start a new (data analysis) project, make it a habit to create a new RStudio Project.

Because you want your project to work:

  • not only now, but also in a few years;
  • when the folder and file paths have changed;
  • when collaborators want to run your code on their computer.


RStudio Projects set the working directory to the project folder, so you can use relative file paths (no more broken paths!). As a result, the project can be moved around on your computer, or onto other computers, and will still “just work”.
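A minimal sketch of a relative path. The file name data/my_data.csv is hypothetical; the point is that the path is built relative to the working directory (the project folder) instead of starting from a drive letter or home folder. file.path() uses "/" as separator on every platform.

```r
# Inside an RStudio Project the working directory is the project folder
getwd()

# A relative path to a (hypothetical) data file inside the project
data_path <- file.path("data", "my_data.csv")
data_path
## [1] "data/my_data.csv"
```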

Example: Data analysis RStudio project

All data, scripts, and output should be stored within the project directory.

Every time you want to work on this project: open the project by clicking the .Rproj file.


R data objects

Using R as a calculator

The simplest thing you can do with R is arithmetic:

100 + 10
## [1] 110
9 / 3
## [1] 3


Here are the common signs to use in arithmetic:

Operation        Sign
Addition         +
Subtraction      -
Multiplication   *
Division         /
Exponent         ^ or **
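Each operator from the table in action, as a small sketch with arbitrary numbers:

```r
10 + 3   # addition
## [1] 13
10 - 3   # subtraction
## [1] 7
10 * 3   # multiplication
## [1] 30
10 / 3   # division
## [1] 3.333333
2^3      # exponent: 2 to the power 3
## [1] 8
2**3     # the same, alternative notation
## [1] 8
```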

Assignment operator

In the reading materials you have learned about the assignment operator <-.

Here x is assigned the value 8:

x <- 8 

If you run this code:

  • The object x, with value 8, is saved in your workspace (a piece of memory)
  • In the environment pane, the tab “Environment”, you will see x under “Values” followed by 8

Printing values

x <- 8 

Assigning does not print the value 8.

To print the value 8, you can type:

x 

# or:

print(x)
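A common shortcut: wrapping an assignment in parentheses assigns and prints in one step.

```r
# Parentheses around an assignment: assign AND print
(x <- 8)
## [1] 8
```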

R is an object-oriented programming language

When you assign values with the assignment operator <- you create an R object.

Objects can contain data, functions or even other objects.

The most commonly used objects are:

  • vector
  • matrix
  • data frame
  • list
  • formulas and models

Vectors

Vector

A vector is a sequence of values (data) of the same type. The simplest object in R is a vector with one element:

x <- 8 
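You can check that a single value really is a vector, with length 1:

```r
x <- 8
length(x)      # a single value is a vector of length 1
## [1] 1
is.vector(x)
## [1] TRUE
```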

Vector generating functions

The function c(...) combines elements into a vector:

v <- c(1, 2, 3, 4, 5)
  • seq(from, to) or from:to generates a sequence of numbers in steps of 1
seq(from = 1, to = 5)
## [1] 1 2 3 4 5
1:5
## [1] 1 2 3 4 5
  • rep(..., times) repeats ... a number of times
rep(1:5, times = 2)
##  [1] 1 2 3 4 5 1 2 3 4 5
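Both functions have more arguments. As a sketch: by sets the step size in seq(), and each repeats every element instead of the whole vector in rep().

```r
seq(from = 1, to = 10, by = 2)   # steps of 2 instead of 1
## [1] 1 3 5 7 9
rep(c("cat", "dog"), each = 2)   # repeat each element, not the whole vector
## [1] "cat" "cat" "dog" "dog"
```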

Classes

Vectors (and other R objects) can contain different types of data (classes):

Numeric

v <- c(1, 2, 3, 4, 5)
class(v)
## [1] "numeric"

Character

char <- c("cat", "dog")
class(char)
## [1] "character"


Logical data can take only one of two values: TRUE or FALSE.

v <- c(1, 2, 3, 4, 5)

# Identify elements > 3 in numeric vector v:
greater_than_3 <- v > 3

print(greater_than_3)
## [1] FALSE FALSE FALSE  TRUE  TRUE
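Logical vectors are useful in two common ways, sketched below: summing them counts the TRUE values (TRUE counts as 1, FALSE as 0), and using them inside [ ] selects the matching elements.

```r
v <- c(1, 2, 3, 4, 5)
greater_than_3 <- v > 3

sum(greater_than_3)   # number of elements larger than 3
## [1] 2
v[greater_than_3]     # keep only the elements where the condition is TRUE
## [1] 4 5
```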

Vector classes

  • all elements of a vector (are forced to) have the same class
class(num <- c(1, 2))
## [1] "numeric"

class(char <- c("cat", "dog"))
## [1] "character"

c(num, char)
## [1] "1"   "2"   "cat" "dog"

class(c(num, char))
## [1] "character"
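The conversion (coercion) always goes towards the more general class: logical becomes numeric, and numeric or logical becomes character. A small sketch:

```r
c(TRUE, 2)       # logical + numeric: TRUE becomes 1
## [1] 1 2
c(TRUE, "cat")   # logical + character: TRUE becomes "TRUE"
## [1] "TRUE" "cat"
```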

Matrices

Matrix generating functions

  • matrix(data, nrow, ncol) generates a matrix
    • all elements (are forced to) have the same class
M <- matrix(data = 1:6, nrow = 2, ncol = 3)
M
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
class(M)
## [1] "matrix" "array"

cbind(...) combines vectors as the columns of a matrix (the numbers become character, because all elements must have the same class):

cbind(a = 1:2, b = c("cat", "dog"))
##      a   b    
## [1,] "1" "cat"
## [2,] "2" "dog"

rbind(...) collects vectors as rows:

rbind(a = 1:2, b = c("cat", "dog"))
##   [,1]  [,2] 
## a "1"   "2"  
## b "cat" "dog"
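A few basic matrix operations, as a sketch on the same 2 x 3 matrix: dim() gives the number of rows and columns, M[row, column] selects elements, and byrow = TRUE fills the matrix row by row instead of column by column.

```r
M <- matrix(data = 1:6, nrow = 2, ncol = 3)

dim(M)      # number of rows and columns
## [1] 2 3
M[2, 3]     # element in row 2, column 3
## [1] 6
M[1, ]      # the complete first row
## [1] 1 3 5

matrix(data = 1:6, nrow = 2, byrow = TRUE)   # fill row by row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```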

Data frames

Data frame generating functions

  • data.frame(...) collects vectors as variables in a data frame
    • variables can have different classes
df <- data.frame(x = 1:2, y = c("cat", "dog"), z = c(TRUE, FALSE))
df
##   x   y     z
## 1 1 cat  TRUE
## 2 2 dog FALSE
class(df)
## [1] "data.frame"

sapply(df, class)
##           x           y           z 
##   "integer" "character"   "logical"
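A sketch of common ways to work with the variables in a data frame: $ extracts one variable as a vector, nrow() counts the observations, and a logical vector inside [ , ] selects rows.

```r
df <- data.frame(x = 1:2, y = c("cat", "dog"), z = c(TRUE, FALSE))

df$y          # a single variable, returned as a vector
## [1] "cat" "dog"
nrow(df)      # number of rows (observations)
## [1] 2
df[df$z, ]    # only the rows where z is TRUE
```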

Lists

List generating function

  • list(...) creates a list

    • can contain objects of any dimension and class
    • used for collecting the output of R functions (e.g. linear regression)
L <- list(v = c(1, 2), matrix = M, df = df, list(1:10))
class(L)
## [1] "list"
sapply(L, class)
## $v
## [1] "numeric"
## 
## $matrix
## [1] "matrix" "array" 
## 
## $df
## [1] "data.frame"
## 
## [[4]]
## [1] "list"
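A sketch of how to get elements out of a list, and of the claim that model output is stored as a list. $ and [[ ]] return the element itself, single brackets [ ] return a smaller list. The regression uses the built-in cars data.

```r
L <- list(v = c(1, 2), df = data.frame(x = 1:2), list(1:10))

L$v        # extract an element by its name
## [1] 1 2
L[[3]]     # extract an element by position (this one has no name)
L["v"]     # single brackets return a smaller list, not the element itself

# Output of a linear regression is stored as a list
fit <- lm(dist ~ speed, data = cars)
is.list(fit)
## [1] TRUE
names(fit)  # components you can extract, e.g. fit$coefficients
```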

Factors


  • factor(...) converts a vector into a factor

    • factors have levels
    • used for categorical variables in analyses (e.g. linear model)
animals <- rep(c("cat", "dog"), 4)
summary(animals)
##    Length     Class      Mode 
##         8 character character

factor(animals)
## [1] cat dog cat dog cat dog cat dog
## Levels: cat dog
summary(factor(animals))
## cat dog 
##   4   4
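By default the levels are sorted alphabetically. A sketch of inspecting them with levels(), and of choosing the order yourself with the levels argument (for example to control which category comes first, which a linear model uses as the reference category by default):

```r
animals <- rep(c("cat", "dog"), 4)

f <- factor(animals)
levels(f)    # alphabetical order by default
## [1] "cat" "dog"

f2 <- factor(animals, levels = c("dog", "cat"))   # your own order
levels(f2)
## [1] "dog" "cat"
```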

Assign names with names()

Use names() to assign names to elements in R objects.

For example, to the elements of a list:

names(L) <- c("Vector", "Matrix", "Dataframe", "List")
L
## $Vector
## [1] 1 2
## 
## $Matrix
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
## 
## $Dataframe
##   x   y     z
## 1 1 cat  TRUE
## 2 2 dog FALSE
## 
## $List
## $List[[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10
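names() works the same way for vectors and data frames. A sketch with hypothetical scores and names: once named, elements can be selected by name, and for a data frame names() gives the column (variable) names.

```r
scores <- c(7, 8, 6)
names(scores) <- c("Anna", "Ben", "Cem")   # hypothetical names
scores
scores["Ben"]    # select an element by its name

df <- data.frame(x = 1:2, y = c("cat", "dog"))
names(df) <- c("id", "animal")   # for a data frame: the variable names
names(df)
```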

Use of data objects

When to use the R data objects?

Object            Use                         Why
data frame        statistical analysis        can store variables of any class
model formula     statistical models, plots   concise and readable, flexible, consistent across functions and packages
list              storage of output           can store any object of any class
vector / matrix   programming                 allow fast calculations
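To illustrate why formulas are consistent across functions, the sketch below stores one formula and reuses it in two base R functions: lm() for a regression and aggregate() for group means. It uses the built-in cars data.

```r
f <- dist ~ speed    # read: dist as a function of speed
class(f)
## [1] "formula"

fit <- lm(f, data = cars)                          # linear regression
coef(fit)

means <- aggregate(f, data = cars, FUN = mean)    # mean dist per speed value
head(means)
```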

Naming conventions, style guide

File and object naming

  • File names should be meaningful.
  • Avoid spaces in file names and use one of the naming conventions:
  1. snake_case: words are separated by underscores (_), and all letters are typically in lowercase. Examples: data_analysis.RData, my_data.csv.

  2. camelCase: each word within a compound word is capitalized, except for the first word, and no spaces or underscores are used to separate the words. Examples: calculateMean, summaryStatistics.

  3. PascalCase: the first letter of each word in a compound word is capitalized, and no spaces or underscores are used to separate the words. Examples: DataAnalysis, DescriptiveStatistics.
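The same conventions apply to object names. A sketch with hypothetical names: R is case-sensitive, so mixing conventions silently creates different objects. Pick one convention and use it consistently.

```r
# R is case-sensitive: these are three different objects
mean_age <- 25   # snake_case
meanAge <- 30    # camelCase
MeanAge <- 35    # PascalCase

c(mean_age, meanAge, MeanAge)
## [1] 25 30 35
```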

Spacing and indentation

  • When indenting your code, use 2 spaces. RStudio does this for you!
  • Never use tabs or a mix of tabs and spaces.
  • Place spaces around all operators (=, +, -, <-). Use x <- 5 not x<-5


Exception: spaces around = are optional when passing arguments in a function call.

lm(age ~ bmi, data=boys)

or

lm(age ~ bmi, data = boys)

Commas and punctuation

  • Do not put spaces before commas, but always put a space after commas.
    • c(1, 2, 3)
  • For function arguments, follow the same rule.
    • sum(a = 1, b = 2)


Bad examples:

# Bad: spaces inside the parentheses; write if (debug)
if ( debug )

# Bad: no space after the comma; write x[1, ]
x[1,]

Comments

  • Use # for single-line comments and place them above the code they reference.
  • Keep comments concise and relevant.
# Read the msleep.csv data and save the data as msleep
msleep <- readr::read_csv("msleep.csv")